pacman::p_load(sf,st,arrow,lubridate,tidyverse,raster,tmap,ggplot2, patchwork,spatstat,spNetwork,gifski)1.0 Introduction
Human mobility, the spatial-temporal dynamics of human movements, serves as a critical reflection of societal patterns and human behaviors. With the advancement and pervasiveness of Information and Communication Technologies (ICT) in our daily life, especially smart phone, a large volume of data related to human mobility have been collected. These data provide valuable insights into understanding how individuals and populations move within and between different geographical locations. By using appropriate GIS analysis methods, these data can turn into valuable inisghts for predicting future mobility trrends and developing more efficient and sustainable strategies for managing urban mobility.
In this study, I will apply Spatial Point Patterns Analysis methods to discover the geographical and spatio-temporal distribution of Grab hailing services locations in Singapore. In order to determine the geographical and spatio-temporal patterns of the Grab hailing services, I will develop traditional Kernel Density Estimation (KDE) and Temporal Network Kernel Density Estimation (TNKDE). KDE layers will help identify the areas with high concentration of Grab hailing services, providing insights into the demand and popularity of these services in different parts of Singapore. TNKDE, on the other hand, will allow for analysis of how the distribution of Grab hailing services changes over time, revealing temporal patterns and trends in their usage. These spatial and spatio-temporal analyses will contribute to a better understanding of the dynamics and effectiveness of Grab’s mobility services in Singapore.
2.0 Literature Review of Spatial Point Pattern Analysis
Spatial point pattern analysis is concerned with description, statistical characterization, modeling and visulisation of point patterns over space and making inference about the process that could have generated an observed pattern (Boots & Getis, 1988 ,Rey et al., 2023; Pebesma & Bivand, 2023). According to this theory, empirical spatial distribution of points in our daily life are not controlled by sampling, but a result of an underlying geographically-continuous process (Rey et al., 2023). For example, an COVID-19 cluster did not happen by chance, but due to a spatial process of close-contact infection.
When analysing real-world spatial points, it is important to analyse whether the observed spatial points are randomly distributed or patterned due to a process or interaction (Floch et al., 2018). In “complete random” distribution, points are located everywhere with the same probability and independently of each other. On the other hand, the spatial points can be clustered or dispersed due to an underlying point process. However, it is challenging to use heuristic observation and intuitive interpretation to detect whether a spatial point pattern exists (Baddeley et al., 2015; Floch et al., 2018). Hence, spatial point pattern analysis can be used to detect the spatial concentration or dispersion phenomena.

When analysing and interpreting the properties of a point pattern, these properties can be categorized into two: (a) first-order properties and (b) second-order properties (Yuan et al., 2020; Gimond, 2023). First-order properties concern with the characteristics of individual point locations and their variations of their density across space (Gimond, 2023). Under this conception, observations vary from point to point due to changes in the underlying property. Second-order properties focus on not only individual points, but also the interactions between points and their influences on one another (Gimond, 2023). Under this conception, observations vary from place to place due to interaction effects between observations. First-order properties of point patterns are mostly addressed by density-based techniques, such as quadrat analysis and kernel density estimation, whereas, distance-based techniques, such nearest neighbour index and K-functions, are often used to analyse second-order properties since they take into account the distance between point pairs (Yuan et al., 2020; Gimond, 2023).
3.0 Importing Packages
Before we start the exercise, we will need to import necessary R packages first. We will use the following packages:
sf: provides a standardised way to encode spatial vector data in R environment, facilitating spatial data operations and analysis.st: creats simple features from numeric vectors, matrices, or lists, enabling the representation and manipulation of spatial structures in R.arrow: for reading and writing Apache Parquet files, a columnar storage file format optimized for use with big data processing frameworks.lubridate: for tackling with temporal data (dates and times), providing tools to parse, manipulate, and do arithmetic with date-time objects.spatstat: A package for statistical analysis of spatial data, specifically Spatial Point Pattern Analysis. This package was provided by Baddeley, Turner and Ruback (2015) and gives a comprehensive list of functions to perform 1st- and 2nd-order spatial point patterns analysis and derive kernel density estimation (KDE) layer.tidyverse: a collection of R packages designed for data science. All packages share an underlying design philosophy, grammar, and data structure.raster: reads, writes, manipulates, analyses and model of gridded spatial data (i.e. raster). In this hands-on exercise, it will be used to convert image output generate by spatstat into raster format.tmap: Packages used for creating static and interactive visualisations summary statistics and KDE layers.spatstat: for spatial statistics with a strong focus on analysing spatial point patterns.spNetwork: provides several functions to perform spatial analysis on network, including network kernel density estimation and point pattern analysis.gifski: for converting images to GIF animations, useful for creating dynamic visualizations and presentations.
4.0 Importing Datasets into R Environment
4.1 Datasets
In this study, we will use Grab-Posisi dataset, which is a comprehensive GPS trajectory dataset for car-hailing services in Southeast Asia. Apart from the time and location of the object, GPS trajectories are also characterised by other parameters such as speed, the headed direction, the area and distance covered during its travel, and the travelled time. Thus, the trajectory patterns from users GPS data are a valuable source of information for a wide range of urban applications, such as solving transportation problems, traffic prediction, and developing reasonable urban planning.
Moreover, we will also use OpenStreetMap dataset, which is an open-sourced geospatial dataset including shapefiles of important layers including road networks, forests, building footprints and many other points of interest.
To extract the Singapore boundary, we will use Master Plan 2019 Subzone Boundary (No Sea), provided by data.gov.sg.
4.2 Importing Grab-Posisi Dataset
Each trajectory in Grab-Posisi dataset is serialised in a file in Apache Parquet format. The Singapore portion of the dataset is packaged into a total of 10 Parquet files.
Firstly, we will use read_parquet function from arrow package, which allows us to read Parquet files into R environment as a data frame (more specifically, a tibble).
df <- read_parquet('~/IS415-GAA/data/GrabPosisi/part-00000.snappy.parquet',as_data_frame = TRUE)
df1 <- read_parquet('~/IS415-GAA/data/GrabPosisi/part-00001.snappy.parquet',as_data_frame = TRUE)
df2 <- read_parquet('~/IS415-GAA/data/GrabPosisi/part-00002.snappy.parquet',as_data_frame = TRUE)
df3 <- read_parquet('~/IS415-GAA/data/GrabPosisi/part-00003.snappy.parquet',as_data_frame = TRUE)
df4 <- read_parquet('~/IS415-GAA/data/GrabPosisi/part-00004.snappy.parquet',as_data_frame = TRUE)
df5 <- read_parquet('~/IS415-GAA/data/GrabPosisi/part-00005.snappy.parquet',as_data_frame = TRUE)
df6 <- read_parquet('~/IS415-GAA/data/GrabPosisi/part-00006.snappy.parquet',as_data_frame = TRUE)
df7 <- read_parquet('~/IS415-GAA/data/GrabPosisi/part-00007.snappy.parquet',as_data_frame = TRUE)
df8 <- read_parquet('~/IS415-GAA/data/GrabPosisi/part-00008.snappy.parquet',as_data_frame = TRUE)
df9 <- read_parquet('~/IS415-GAA/data/GrabPosisi/part-00009.snappy.parquet',as_data_frame = TRUE)To consolidate all trajectory instances into a single dataframe, we’ll vertically bind all 10 imported dataframes using bind_rows() function from tidyverse package.
df_trajectories <- bind_rows(df,df1,df2,df3,df4,df5,df6,df7,df8,df9)To get a quick overview of the dataset, we’ll first check the number of trajectory instances using dim() function. Then, we’ll use head() function to quickly scan through the data columns and values
dim(df_trajectories)[1] 30329685 9
head(df_trajectories)# A tibble: 6 × 9
trj_id driving_mode osname pingtimestamp rawlat rawlng speed bearing accuracy
<chr> <chr> <chr> <int> <dbl> <dbl> <dbl> <int> <dbl>
1 70014 car android 1554943236 1.34 104. 18.9 248 3.9
2 73573 car android 1555582623 1.32 104. 17.7 44 4
3 75567 car android 1555141026 1.33 104. 14.0 34 3.9
4 1410 car android 1555731693 1.26 104. 13.0 181 4
5 4354 car android 1555584497 1.28 104. 14.8 93 3.9
6 32630 car android 1555395258 1.30 104. 23.2 73 3.9
From the result above, we can see that the dataset includes a total of 30329685 trajectory instances, each with a total of 9 columns as follows:
| Column Name | Data Type | Remark |
|---|---|---|
| trj_id | chr | Trajectory ID |
| driving_mode | chr | Mode of Driving |
| osname | chr | |
| pingtimestamp | int | Data Recording Timestamp |
| rawlat | num | Latitude Value (WGS-84) |
| rawlng | num | Longitude Value (WGS-84) |
| speed | num | Speed |
| bearing | int | Bearing |
| accuracy | num | Accuracy |
From the above table, it is seen that the pingtimestamp is recorded as int. We need to convert this data to proper datetime format to derive meaningful temporal insights of the data. To do so, we will use as_datetime() function from lubridate package.
df_trajectories$pingtimestamp <- as_datetime(df_trajectories$pingtimestamp)4.3 Importing OpenStreetMap road data for Malaysia, Singapore and Brunei
The gis_osm_roads_free_1 dataset, which we downloaded from OpenStreetMap, is in ESRI shapefile format. To use this data in an R-environment, we need to import it as an sf object. We can do this using the st_read() function of the sf package. This function reads the shapefile data and returns an sf object that can be used for further analysis.
osm_road_sf <- st_read(dsn = "~/IS415-GAA/data/geospatial",
layer = "gis_osm_roads_free_1") %>% st_transform(crs = 3414)4.4 Importing Singapore Master Plan Planning Subzone boundary data
The MP14_SUBZONE_WEB_PL dataset, which we downloaded from data.gov.sg, is in ESRI shapefile format. To use this data in an R-environment, we need to import it as an sf object. We can do this using the st_read() function of the sf package. This function reads the shapefile data and returns an sf object that can be used for further analysis.
mpsz_sf <- st_read(dsn = "~/IS415-GAA/data/geospatial",
layer = "MPSZ-2019") %>% st_transform(crs = 3414)Reading layer `MPSZ-2019' from data source
`/Users/khantminnaing/IS415-GAA/data/geospatial' using driver `ESRI Shapefile'
Simple feature collection with 332 features and 6 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 103.6057 ymin: 1.158699 xmax: 104.0885 ymax: 1.470775
Geodetic CRS: WGS 84
In the code chunk above, we use %>% operator is used to pipe the output of st_read() to the st_transform() function. Since the dataset we are using is the Singapore boundary, we need to assign the standard coordinate reference system for Singapore, which is SVY21 (EPSG:3414). st_transform() function transforms the coordinate reference system of the sf object to 3414.
After importing the dataset, we will plot it to see how it looks. The plot() function is used to plot the geometry of the sf object. The st_geometry() function is used to extract the geometry of the mpsz_sf object.
plot(st_geometry(mpsz_sf))
5.0 Data Wrangling
Data wrangling is the process of converting and transforming raw data into a usable form and is carried out prior to conducting any data analysis.
5.1 Extracting Trip Starting Locations and Temporal Data Values from Grab-Posisi dataset
Firstly, we will extract trip starting locations for all unqiue trajectories in the dataset and store them to a new df named origin_df. We are also interested in obtaining valuable temporal data such as the day of the week, the hour, and the date (yy-mm-dd). To do so, we will use the following functions from lubridate package, and add the newly derived values as columns to origin_df.
wday: allows us to get days component of a date-timehour: allows us to get hours component of a date-timemday: allows us to parse dates with year, month, and day components
origin_df <- df_trajectories %>%
group_by(trj_id) %>%
arrange(pingtimestamp) %>%
filter(row_number()==1) %>%
mutate(weekday = wday(pingtimestamp,
label=TRUE,
abbr=TRUE),
pickup_hr = factor(hour(pingtimestamp)),
day = factor(mday(pingtimestamp)))5.2 Extracting Trip Ending Locations and Temporal Data Values from Grab-Posisi dataset
Similar to what we did in previous session, we are also interested to extract trip ending locations and associated temporal data into a new df called destination_df. We will use the same functions from previous session here.
destination_df <- df_trajectories %>%
group_by(trj_id) %>%
arrange(desc(pingtimestamp)) %>%
filter(row_number()==1) %>%
mutate(weekday = wday(pingtimestamp,
label=TRUE,
abbr=TRUE),
dropoff_hr = factor(hour(pingtimestamp)),
day = factor(mday(pingtimestamp)))arrange() function sort the timestamps in ascending order by default. Hence, this default order is applied to origin_df. However, for destination_df, the arrange(desc()) argument is used to modify the default order to descending.
5.3 Converting to sf tibble data.frame
origin_sf <- st_as_sf(origin_df,
coords = c("rawlng", "rawlat"),
crs = 4326) %>%
st_transform(crs = 3414)
destination_sf <- st_as_sf(destination_df,
coords = c("rawlng", "rawlat"),
crs = 4326) %>%
st_transform(crs = 3414)5.4 Creating a CoastalLine of Singapore
The original mpsz_sf dataset we imported include information of all URA Master Plan planning area boundaries. However, for this analysis, we only need the national-level boundary of Singapore. Hence, we will need to union all the subzone boundaries to one single polygon boundary. Also, Grab ride-hailing service is only available on the main Singapore islands. Hence, we will need to remove outer islands which Grab service is not available. In particular, we will remove the following planning subzones: NORTH-EASTERN ISLANDS, SOUTHERN GROUP, SUDONG & SEMAKAU.
We can remove these subzones using the subset() function. The subset() function is used to extract rows from a data frame that meet certain conditions.
northeasten.islands <- subset(mpsz_sf, mpsz_sf$SUBZONE_N == "NORTH-EASTERN ISLANDS")
southern.islands <- subset(mpsz_sf, mpsz_sf$SUBZONE_N == "SOUTHERN GROUP")
sudong <- subset(mpsz_sf, mpsz_sf$SUBZONE_N == "SUDONG")
semakau <- subset(mpsz_sf,mpsz_sf$SUBZONE_N == "SEMAKAU")
outerislands <- dplyr::bind_rows(list(northeasten.islands,southern.islands,sudong,semakau))In the code chunk above, we first created four new data frames called northeasten.islands, southern.islands, sudong, and semakau by selecting rows from mpsz_sf where the value in the SUBZONE_N column matches the corresponding value.
After that, we used bind_rows() function from the dplyr package to combine these four data frames into a single data frame called outerislands.
After importing the dataset, we will plot it to see how it looks.
plot(st_geometry(outerislands))
As mentioned earlier, we only need to get national-level boundary of Singapore, without outer islands. To do so, we will need to process the mpsz_sf layer to achieve the outcome. - We will first use st_union() function from the sf package to combine the geometries of mpsz_sf and outerislands sf objects into a single geometry each. - Next, we will use st_difference() function then removes the overlapping areas between the two geometries. - Finally, we will store the non-overlapping areas into a new sf objected called sg_sf.
sg_sf <- st_difference(st_union(mpsz_sf), st_union(outerislands))To assess whether the geometry of the newly created sg_sf matches our intended outcome, we will plot it out.
plot(st_geometry(sg_sf))
5.5 Extracting Road Layers within Singapore
As we have seen in Section 4.3., osm_road_sf dataset includes road networks from not only Singapore, but also Malaysia and Brunei. However, our analysis is focused on Singapore. Hence, we will need to remove unecessary data rows. To do so, we will
sg_road_sf <- st_intersection(osm_road_sf,lsg_sf)Next, we will look at the classification of road networks as provided by OpenStreetMap.
unique(sg_road_sf$fclass) [1] "primary" "residential" "tertiary" "footway"
[5] "service" "secondary" "motorway" "motorway_link"
[9] "trunk" "trunk_link" "primary_link" "pedestrian"
[13] "living_street" "unclassified" "steps" "track_grade2"
[17] "track" "secondary_link" "cycleway" "path"
[21] "tertiary_link" "track_grade1" "track_grade3" "unknown"
[25] "track_grade5" "bridleway" "track_grade4"
Looking at the road classification, it is observed that not all categories are relevant to our analysis, which is primarily concerned with driving networks where taxis can facilitate pick-ups or drop-offs. Hence, we will implement a filtering process on the dataset to exclude road segments that fall outside the scope of our analysis.
Firstly we will specify the road classes that we want to retain.
driving_classes <- c("primary", "primary_link", "residential", "secondary", "secondary_link", "service", "tertiary", "tertiary_link") Next, we will filter sg_road_sf object to remove all the rows that does not have our desired f_class attribute value.
sg_driving_sf <- sg_road_sf %>%
filter(fclass %in% driving_classes)
unique(sg_driving_sf$fclass)[1] "primary" "residential" "tertiary" "service"
[5] "secondary" "primary_link" "secondary_link" "tertiary_link"
Why motorway and motorway_link classes are not included?
According to OpenStreetMap, fclass motorway refers to expressway. In Singapore, stopping or parking a vehicle on an expressway is illegal under the Road Traffic Act. Hence, motorway (and motorway_link) are not relevant for network constraint kernel density estimation (NKDE) analysis that we will carry out later.
Now that we have filterd out the dataset, we will now plot to see the driving road network of Singapore using tmap.
tmap_mode("plot")
tm_shape(sg_sf) +
tm_polygons() +
tm_shape(sg_driving_sf) +
tm_lines(col="fclass", palette ="viridis") +
tm_layout(main.title = "Road Network in Singapore",
main.title.position = "center",
main.title.size = 1.2,
legend.outside = TRUE,
frame = TRUE) +
tm_borders(alpha = 0.5)
5.5 Converting the Simple Features to Planar Point Pattern Object
In order to use the capabilities of spatstat packahe, a spatial dataset should be converted into an object of class planar point pattern ppp (Baddeley et al., 2015). A point pattern object contains the spatial coordinates of the points, the marks attached to the points (if any), the window in which the points were observed, and the name of the unit of length for the spatial coordinates. s. Thus, a single object of class ppp contains all the information required to perform spatial point pattern analysis.
In previous section, we have created sf objects of Grab trajectory origin and destination points. Now, we will convert them into ppp objects using as.ppp() function from spatstat package.
origin_ppp <- as.ppp(st_coordinates(origin_sf), st_bbox(origin_sf))
par(mar = c(0,0,1,0))
plot(origin_ppp)
The code chunk above converts the origin_sf object to a point pattern object of class ppp. st_coordinates() function is used to extract the coordinates of the origin_sf object and st_bbox() function is used to extract the bounding box of the origin_sf object. The resulting object origin_ppp is a point pattern object of class ppp.
destination_ppp <- as.ppp(st_coordinates(destination_sf), st_bbox(destination_sf))
par(mar = c(0,0,1,0))
plot(destination_ppp)
5.6 Handling Data Errors
Before going striaght into analysis, we will need to a quick look at the summary statistics of the newly created ppp objects. This is an important step to ensure that the data is free of errors and that a reliable analysis can be performed.
5.6.1 Data Error Handling for origin_ppp
We will use summary() function to get summary information of origin_ppp object.
summary(origin_ppp)Planar point pattern: 28000 points
Average intensity 2.473666e-05 points per square unit
Coordinates are given to 3 decimal places
i.e. rounded to the nearest multiple of 0.001 units
Window: rectangle = [3628.24, 49845.23] x [25198.14, 49689.64] units
(46220 x 24490 units)
Window area = 1131920000 square units
We can also check if there is any duplicated points in origin_ppp object using any(duplicated() function.
any(duplicated(origin_ppp))[1] FALSE
The code output is FALSE, which means there are no duplication of point coordaintes in the origin_ppp object.
Why do we need to check duplication?
When analyzing spatial point processes, it is important to avoid duplication of points. This is because statistical methodology for spatial point processes is based largely on the assumption that processes are simple, i.e., that points of the process can never be coincident. When the data have coincident points, some statistical procedures designed for simple point processes will be severely affected (Baddeley et al., 2015).
5.6.2 Data Error Handling for destination_ppp
We will use summary() function to get summary information of destination_ppp object.
summary(destination_ppp)Planar point pattern: 28000 points
Average intensity 2.493661e-05 points per square unit
Coordinates are given to 3 decimal places
i.e. rounded to the nearest multiple of 0.001 units
Window: rectangle = [3637.21, 49870.63] x [25221.3, 49507.79] units
(46230 x 24290 units)
Window area = 1122850000 square units
We can also check if there is any duplicated points in destination_ppp object using any(duplicated() function.
any(duplicated(destination_ppp))[1] FALSE
The code output is FALSE, which means there are no duplication of point coordinates in the destination_ppp object.
5.7 Creating Observation Windows
Many data types in spatstat require us to specify the region of space inside which the data were observed. This is the observation window and it is represented by an object of class owin. In this analysis, our study area is Singapore, hence we will use Singapore boundary as the observation window for spatial point pattern analysis.
In Section 5.4, we have already created the sg_sf object, which represents the Singapore boundary (without outer islands). To convert this sf object to owin object, we will use as.owin() function from spatstat package.
sg_owin <- as.owin(sg_sf)
plot.owin(sg_owin)
We will use summary() function to get summary information of sg_owin object.
summary(sg_owin)Window: polygonal boundary
56 separate polygons (36 holes)
vertices area relative.area
polygon 1 15307 7.00834e+08 9.92e-01
polygon 2 285 1.61128e+06 2.28e-03
polygon 3 27 1.50315e+04 2.13e-05
polygon 4 (hole) 41 -4.01660e+04 -5.69e-05
polygon 5 (hole) 317 -5.11280e+04 -7.24e-05
polygon 6 (hole) 3 -4.14099e-04 -5.86e-13
polygon 7 30 2.80002e+04 3.97e-05
polygon 8 (hole) 4 -2.86396e-01 -4.06e-10
polygon 9 (hole) 3 -1.81439e-04 -2.57e-13
polygon 10 (hole) 3 -8.68789e-04 -1.23e-12
polygon 11 (hole) 3 -5.99535e-04 -8.49e-13
polygon 12 (hole) 3 -3.04561e-04 -4.31e-13
polygon 13 (hole) 3 -4.46076e-04 -6.32e-13
polygon 14 (hole) 3 -3.39794e-04 -4.81e-13
polygon 15 (hole) 3 -4.52043e-05 -6.40e-14
polygon 16 (hole) 3 -3.90173e-05 -5.53e-14
polygon 17 (hole) 3 -9.59850e-05 -1.36e-13
polygon 18 (hole) 4 -2.54488e-04 -3.60e-13
polygon 19 (hole) 4 -4.28453e-01 -6.07e-10
polygon 20 (hole) 4 -2.18616e-04 -3.10e-13
polygon 21 (hole) 5 -2.44411e-04 -3.46e-13
polygon 22 (hole) 5 -3.64686e-02 -5.16e-11
polygon 23 71 8.18750e+03 1.16e-05
polygon 24 (hole) 6 -8.37554e-01 -1.19e-09
polygon 25 (hole) 38 -7.79904e+03 -1.10e-05
polygon 26 (hole) 3 -3.41897e-05 -4.84e-14
polygon 27 (hole) 3 -3.65499e-03 -5.18e-12
polygon 28 (hole) 3 -4.95057e-02 -7.01e-11
polygon 29 91 1.49663e+04 2.12e-05
polygon 30 (hole) 3 -3.79135e-02 -5.37e-11
polygon 31 (hole) 5 -2.92235e-04 -4.14e-13
polygon 32 (hole) 3 -7.43616e-06 -1.05e-14
polygon 33 (hole) 270 -1.21455e+03 -1.72e-06
polygon 34 (hole) 19 -4.39650e+00 -6.23e-09
polygon 35 (hole) 35 -1.38385e+02 -1.96e-07
polygon 36 (hole) 23 -1.99656e+01 -2.83e-08
polygon 37 40 1.38607e+04 1.96e-05
polygon 38 (hole) 41 -6.00381e+03 -8.50e-06
polygon 39 (hole) 7 -1.40546e-01 -1.99e-10
polygon 40 (hole) 11 -8.36705e+01 -1.18e-07
polygon 41 (hole) 3 -2.33435e-03 -3.31e-12
polygon 42 45 2.51218e+03 3.56e-06
polygon 43 139 3.22293e+03 4.56e-06
polygon 44 148 3.10395e+03 4.40e-06
polygon 45 (hole) 4 -1.72650e-04 -2.44e-13
polygon 46 75 1.73526e+04 2.46e-05
polygon 47 83 5.28920e+03 7.49e-06
polygon 48 106 3.04104e+03 4.31e-06
polygon 49 264 1.50631e+06 2.13e-03
polygon 50 71 5.63061e+03 7.97e-06
polygon 51 10 1.99717e+02 2.83e-07
polygon 52 (hole) 3 -1.37223e-02 -1.94e-11
polygon 53 487 2.06117e+06 2.92e-03
polygon 54 65 8.42861e+04 1.19e-04
polygon 55 47 3.82087e+04 5.41e-05
polygon 56 22 6.74651e+03 9.55e-06
enclosing rectangle: [2667.54, 55941.94] x [21448.47, 50256.33] units
(53270 x 28810 units)
Window area = 706156000 square units
Fraction of frame area: 0.46
5.8 Combining ppp objects and owin object
In section 5.5, we have created two ppp objects - origin_ppp and destination_ppp, each representing the spatial points of Grab trajectory origin and destination. In section 5.7, we have created a owin object called sg_owin, which represent the observation window of our analysis.
The observation window sg_owin and the point pattern origin_ppp or destination_ppp can be combined, so that the custom window replaces the default ractangular extent (as seen in section 5.5).
origin_ppp_sg = origin_ppp[sg_owin]
destination_ppp_sg = destination_ppp[sg_owin]
par(mar = c(0,0,1,0))
plot(origin_ppp_sg)
plot(destination_ppp_sg)
We will use summary() function to get summary information of the newly created origin_ppp_sg object and destination_ppp_sg object.
summary(origin_ppp_sg)Planar point pattern: 28000 points
Average intensity 3.965129e-05 points per square unit
Coordinates are given to 3 decimal places
i.e. rounded to the nearest multiple of 0.001 units
Window: polygonal boundary
56 separate polygons (36 holes)
vertices area relative.area
polygon 1 15307 7.00834e+08 9.92e-01
polygon 2 285 1.61128e+06 2.28e-03
polygon 3 27 1.50315e+04 2.13e-05
polygon 4 (hole) 41 -4.01660e+04 -5.69e-05
polygon 5 (hole) 317 -5.11280e+04 -7.24e-05
polygon 6 (hole) 3 -4.14099e-04 -5.86e-13
polygon 7 30 2.80002e+04 3.97e-05
polygon 8 (hole) 4 -2.86396e-01 -4.06e-10
polygon 9 (hole) 3 -1.81439e-04 -2.57e-13
polygon 10 (hole) 3 -8.68789e-04 -1.23e-12
polygon 11 (hole) 3 -5.99535e-04 -8.49e-13
polygon 12 (hole) 3 -3.04561e-04 -4.31e-13
polygon 13 (hole) 3 -4.46076e-04 -6.32e-13
polygon 14 (hole) 3 -3.39794e-04 -4.81e-13
polygon 15 (hole) 3 -4.52043e-05 -6.40e-14
polygon 16 (hole) 3 -3.90173e-05 -5.53e-14
polygon 17 (hole) 3 -9.59850e-05 -1.36e-13
polygon 18 (hole) 4 -2.54488e-04 -3.60e-13
polygon 19 (hole) 4 -4.28453e-01 -6.07e-10
polygon 20 (hole) 4 -2.18616e-04 -3.10e-13
polygon 21 (hole) 5 -2.44411e-04 -3.46e-13
polygon 22 (hole) 5 -3.64686e-02 -5.16e-11
polygon 23 71 8.18750e+03 1.16e-05
polygon 24 (hole) 6 -8.37554e-01 -1.19e-09
polygon 25 (hole) 38 -7.79904e+03 -1.10e-05
polygon 26 (hole) 3 -3.41897e-05 -4.84e-14
polygon 27 (hole) 3 -3.65499e-03 -5.18e-12
polygon 28 (hole) 3 -4.95057e-02 -7.01e-11
polygon 29 91 1.49663e+04 2.12e-05
polygon 30 (hole) 3 -3.79135e-02 -5.37e-11
polygon 31 (hole) 5 -2.92235e-04 -4.14e-13
polygon 32 (hole) 3 -7.43616e-06 -1.05e-14
polygon 33 (hole) 270 -1.21455e+03 -1.72e-06
polygon 34 (hole) 19 -4.39650e+00 -6.23e-09
polygon 35 (hole) 35 -1.38385e+02 -1.96e-07
polygon 36 (hole) 23 -1.99656e+01 -2.83e-08
polygon 37 40 1.38607e+04 1.96e-05
polygon 38 (hole) 41 -6.00381e+03 -8.50e-06
polygon 39 (hole) 7 -1.40546e-01 -1.99e-10
polygon 40 (hole) 11 -8.36705e+01 -1.18e-07
polygon 41 (hole) 3 -2.33435e-03 -3.31e-12
polygon 42 45 2.51218e+03 3.56e-06
polygon 43 139 3.22293e+03 4.56e-06
polygon 44 148 3.10395e+03 4.40e-06
polygon 45 (hole) 4 -1.72650e-04 -2.44e-13
polygon 46 75 1.73526e+04 2.46e-05
polygon 47 83 5.28920e+03 7.49e-06
polygon 48 106 3.04104e+03 4.31e-06
polygon 49 264 1.50631e+06 2.13e-03
polygon 50 71 5.63061e+03 7.97e-06
polygon 51 10 1.99717e+02 2.83e-07
polygon 52 (hole) 3 -1.37223e-02 -1.94e-11
polygon 53 487 2.06117e+06 2.92e-03
polygon 54 65 8.42861e+04 1.19e-04
polygon 55 47 3.82087e+04 5.41e-05
polygon 56 22 6.74651e+03 9.55e-06
enclosing rectangle: [2667.54, 55941.94] x [21448.47, 50256.33] units
(53270 x 28810 units)
Window area = 706156000 square units
Fraction of frame area: 0.46
summary(destination_ppp_sg)Planar point pattern: 27997 points
Average intensity 3.964704e-05 points per square unit
Coordinates are given to 3 decimal places
i.e. rounded to the nearest multiple of 0.001 units
Window: polygonal boundary
56 separate polygons (36 holes)
vertices area relative.area
polygon 1 15307 7.00834e+08 9.92e-01
polygon 2 285 1.61128e+06 2.28e-03
polygon 3 27 1.50315e+04 2.13e-05
polygon 4 (hole) 41 -4.01660e+04 -5.69e-05
polygon 5 (hole) 317 -5.11280e+04 -7.24e-05
polygon 6 (hole) 3 -4.14099e-04 -5.86e-13
polygon 7 30 2.80002e+04 3.97e-05
polygon 8 (hole) 4 -2.86396e-01 -4.06e-10
polygon 9 (hole) 3 -1.81439e-04 -2.57e-13
polygon 10 (hole) 3 -8.68789e-04 -1.23e-12
polygon 11 (hole) 3 -5.99535e-04 -8.49e-13
polygon 12 (hole) 3 -3.04561e-04 -4.31e-13
polygon 13 (hole) 3 -4.46076e-04 -6.32e-13
polygon 14 (hole) 3 -3.39794e-04 -4.81e-13
polygon 15 (hole) 3 -4.52043e-05 -6.40e-14
polygon 16 (hole) 3 -3.90173e-05 -5.53e-14
polygon 17 (hole) 3 -9.59850e-05 -1.36e-13
polygon 18 (hole) 4 -2.54488e-04 -3.60e-13
polygon 19 (hole) 4 -4.28453e-01 -6.07e-10
polygon 20 (hole) 4 -2.18616e-04 -3.10e-13
polygon 21 (hole) 5 -2.44411e-04 -3.46e-13
polygon 22 (hole) 5 -3.64686e-02 -5.16e-11
polygon 23 71 8.18750e+03 1.16e-05
polygon 24 (hole) 6 -8.37554e-01 -1.19e-09
polygon 25 (hole) 38 -7.79904e+03 -1.10e-05
polygon 26 (hole) 3 -3.41897e-05 -4.84e-14
polygon 27 (hole) 3 -3.65499e-03 -5.18e-12
polygon 28 (hole) 3 -4.95057e-02 -7.01e-11
polygon 29 91 1.49663e+04 2.12e-05
polygon 30 (hole) 3 -3.79135e-02 -5.37e-11
polygon 31 (hole) 5 -2.92235e-04 -4.14e-13
polygon 32 (hole) 3 -7.43616e-06 -1.05e-14
polygon 33 (hole) 270 -1.21455e+03 -1.72e-06
polygon 34 (hole) 19 -4.39650e+00 -6.23e-09
polygon 35 (hole) 35 -1.38385e+02 -1.96e-07
polygon 36 (hole) 23 -1.99656e+01 -2.83e-08
polygon 37 40 1.38607e+04 1.96e-05
polygon 38 (hole) 41 -6.00381e+03 -8.50e-06
polygon 39 (hole) 7 -1.40546e-01 -1.99e-10
polygon 40 (hole) 11 -8.36705e+01 -1.18e-07
polygon 41 (hole) 3 -2.33435e-03 -3.31e-12
polygon 42 45 2.51218e+03 3.56e-06
polygon 43 139 3.22293e+03 4.56e-06
polygon 44 148 3.10395e+03 4.40e-06
polygon 45 (hole) 4 -1.72650e-04 -2.44e-13
polygon 46 75 1.73526e+04 2.46e-05
polygon 47 83 5.28920e+03 7.49e-06
polygon 48 106 3.04104e+03 4.31e-06
polygon 49 264 1.50631e+06 2.13e-03
polygon 50 71 5.63061e+03 7.97e-06
polygon 51 10 1.99717e+02 2.83e-07
polygon 52 (hole) 3 -1.37223e-02 -1.94e-11
polygon 53 487 2.06117e+06 2.92e-03
polygon 54 65 8.42861e+04 1.19e-04
polygon 55 47 3.82087e+04 5.41e-05
polygon 56 22 6.74651e+03 9.55e-06
enclosing rectangle: [2667.54, 55941.94] x [21448.47, 50256.33] units
(53270 x 28810 units)
Window area = 706156000 square units
Fraction of frame area: 0.46
6.0 Exploratory Spatial Data Analysis
6.1 Creating Point Symbol Maps
tmap_mode("view")
tm_shape(origin_sf) +
tm_dots(alpha=0.4,
size=0.05)
tm_shape(destination_sf) +
tm_dots(alpha=0.4,
size=0.05)6.2 Measuring Central Tendency
Descriptive statistics are used in point pattern analysis to summarise a point pattern’s basic properties, such as its central tendency and dispersion. The mean centre and the median centre are two often employed metrics for central tendency (Gimond, 2019).
6.2.1 Mean Center
Mean center is the arithmetic average of the (x, y) coordinates of all point in the study area. Similar to mean in statistical analysis, mean center is influenced to a greater degree by the outliers. (Yuan et al.,2020)
origin_xy <- st_coordinates(origin_sf)
origin_mc <- apply(origin_xy, 2, mean)
destination_xy <- st_coordinates(destination_sf)
destination_mc <- apply(destination_xy, 2, mean)
origin_mc X Y
28490.57 36939.04
destination_mc X Y
28870.96 36590.49
The results show that the origin and destination mean centres are, respectively, (28490.57, 36939.04) and (28870.96, 36590.49). The two mean centres appear to be situated in close proximity to one another.
6.2.2 Median Center
Median center is the location that minimizes the sum of distances required to travel to all points within an observation window. It can be calculated using an iterative procedure first presented by Kulin and Kuenne (1962). The procedure begins at a predetermined point, such as the median center, as the initial point. Then, the algorithm updates the median center’s new coordinates (x’, y’) continually until the optimal value is reached. The median center, as opposed to the mean center, offers a more reliable indicator of central tendency as it is unaffected by outliers (Yuan et al., 2020).
origin_medc <- apply(origin_xy, 2, median)
destination_medc <- apply(destination_xy, 2, median)
origin_medc X Y
28553.17 36179.05
destination_medc X Y
28855.04 35883.86
Based on the results, the median centres of origin and destination are, respectively, (28553.17, 36179.05) and (28855.04, 35883.86). The two median centres appear to be situated in close proximity to one another.
Moreover, mean centers and median centers for each origin and destination points are similar. This may imply that the distribution of the data is relatively balanced and there is not a significant difference in the spatial patterns between the origin and destination points. Additionally, this indicates that both the mean center and median center are effective measures for analyzing the central tendency of the data in this context.
6.2.3 Plotting Mean and Median Centers
We can try to plot both results obtained from previous section on the same plane for better comparison of the mean center and median center.
par(mar = c(0,0,1,0))
plot(sg_sf, col='light grey', main="mean and median centers of origin_sf")
points(origin_xy, cex=.5)
points(cbind(origin_mc[1], origin_mc[2]), pch='*', col='red', cex=3)
points(cbind(origin_medc[1], origin_medc[2]), pch='*', col='purple', cex=3)
par(mar = c(0,0,1,0))
plot(sg_sf, col='light grey', main="mean and median centers of destination_sf")
points(destination_xy, cex=.5)
points(cbind(destination_mc[1], destination_mc[2]), pch='*', col='yellow', cex=3)
points(cbind(destination_medc[1], destination_medc[2]), pch='*', col='orange', cex=3)
6.2 Measuring Dispersion
6.2.1 Standard Distance
Standard distances are defined similarly to standard deviations. This indicator measures how dispersed a group of points is around its mean center (Gimond, 2023).
origin_sd <- sqrt(sum((origin_xy[,1] - origin_mc[1])^2 + (origin_xy[,2] - origin_mc[2])^2) / nrow(origin_xy))
destination_sd <- sqrt(sum((destination_xy[,1] - destination_mc[1])^2 + (destination_xy[,2] - destination_mc[2])^2) / nrow(destination_xy))
origin_sd[1] 10187.88
destination_sd[1] 9545.69
From the results, the origin and destination standard distances are 10187.88 and 9545.69, respectively. Hence, it appears that origin points are more dispersed than the origin points.
However, it would be challenging to discern why the origin points are more dispersed without further analysis. Further analysis would be needed to determine the factors contributing to the increased dispersion of destination points. Since it is out of scope for this exercise, we will not explore any further.
6.2.3 Plotting Standard Distances
In this section, we will create bearing circles of origin and destination points using the standard distance values we have calculated earlier. This can provide visual representation of their dispersion and make intuitive comparison between them.
par(mar = c(0,0,1,0))
plot(sg_sf, col='light grey', main="standard distance of origin_sf")
points(origin_xy, cex=.5)
points(cbind(origin_mc[1], origin_mc[2]), pch='*', col='red', cex=3)
bearing <- 1:360 * pi/180
cx <- origin_mc[1] + origin_sd * cos(bearing)
cy <- origin_mc[2] + origin_sd * sin(bearing)
circle <- cbind(cx, cy)
lines(circle, col='red', lwd=2)
par(mar = c(0,0,1,0))
plot(sg_sf, col='light grey',main="standard distance of destination_sf")
points(destination_xy, cex=.5)
points(cbind(destination_mc[1], destination_mc[2]), pch='*', col='purple', cex=3)
bearing <- 1:360 * pi/180
cx <- destination_mc[1] + destination_sd * cos(bearing)
cy <- destination_mc[2] + destination_sd * sin(bearing)
circle <- cbind(cx, cy)
lines(circle, col='purple', lwd=2)
A better comparison of the standard distances between origin and destination points can also be achieved by trying to plot both results on the same plane.
par(mar = c(0,0,1,0))
plot(sg_sf, col='light grey',main="standard distances of origin_sf & destination_sf")
points(cbind(origin_mc[1], origin_mc[2]), pch='*', col='red', cex=3)
points(cbind(destination_mc[1], destination_mc[2]), pch='*', col='purple', cex=3)
bearing <- 1:360 * pi/180
origin_cx <- origin_mc[1] + origin_sd * cos(bearing)
origin_cy <- origin_mc[2] + origin_sd * sin(bearing)
destination_cx <- destination_mc[1] + destination_sd * cos(bearing)
destination_cy <- destination_mc[2] + destination_sd * sin(bearing)
origin_circle <- cbind(origin_cx, origin_cy)
destination_circle <- cbind(destination_cx, destination_cy)
lines(origin_circle, col='red', lwd=2)
lines(destination_circle, col='purple', lwd=2)
6.3 Spatial Randomness Test
Clark and Evans (1954) give a very simple test of spatial randomness called Clark and Evans aggregation index (R). It is the ratio of the observed mean nearest neighbour distance in the pattern to that expected for a Poisson point process of the same intensity. A value R>1 suggests ordering, while R<1 suggests clustering.
we will perform the Clark-Evans test of aggregation for a spatial point pattern by using clarkevans.test() of statspat.
The test hypotheses are:
H0 = The distribution of trajectory original points are randomly distributed.
H1= The distribution of trajectory original points are not randomly distributed.
The 95% confidence interval will be used.
clarkevans.test(origin_ppp_sg,
correction="none",
clipregion="sg_owin",
alternative=c("clustered"),
nsim=99)
Clark-Evans test
No edge correction
Z-test
data: origin_ppp_sg
R = 0.27408, p-value < 2.2e-16
alternative hypothesis: clustered (R < 1)
The test hypotheses are:
H0 = The distribution of trajectory destination points are randomly distributed.
H1= The distribution of trajectory destination points are not randomly distributed.
The 95% confidence interval will be used.
clarkevans.test(destination_ppp_sg,
correction="none",
clipregion="sg_owin",
alternative=c("clustered"),
nsim=99)
Clark-Evans test
No edge correction
Z-test
data: destination_ppp_sg
R = 0.29484, p-value < 2.2e-16
alternative hypothesis: clustered (R < 1)
7.0 First-Order Spatial Point Patterns Analysis
After data wrangling is complete, we will start to perform first-order spatial point pattern analysis using functions from spatstat package. As we have discussed in Section 2.0., First-order properties concern the characteristics of individual point locations and their variations of their density across space and are mostly addressed by density-based techniques, such as quadrant analysis and kernel density estimation.
Investigation of the intensity of a point pattern is one of the first and most important steps in point pattern analysis (Baddeley et al., 2015). If the point process has an intensity function λ(u), this function can be estimated non-parametrically by kernel estimation (Baddeley et al., 2015). Kernel estimation allows for smoothing of the probability density estimation of a random variable (in this analysis a point event) based on kernels as weights.
7.1 Rescaling origin_ppp_sg and destination_ppp_sg
The SVY21 Coordinate References System uses meters as the standard unit. Hence, the original_ppp_sg and destination_ppp_sg that we have prepared in the previous sections has “metres” as the unit. However, we will need to convert the measuring unit from metre to kilometeres when calculating the kernel density estimators for entirety of Singapore because kilometers provide a more appropriate scale for analyzing large areas.
origin_ppp_sg.km <- rescale(origin_ppp_sg, 1000, "km")
destination_ppp_sg.km <- rescale(destination_ppp_sg, 1000, "km")7.2 Computing Default Kernel Density Estimation
Kernel Destiny Estimation (KDE) generates a surface (raster) representing the estimated distribution of point events over the observation window. Each cell in the KDE layer carries a value representing the estimated density of that location (Wilkin, 2020). Hence, this approach is also known as local density approach. To build the KDE layer, a localised density is calculated for multiple small subsets of the observation window. However, these subsets overlap throughout each iteration, resulting in a moving window defined by a kernel (Wilkin, 2020; Gimond, 2023).
In this section, we will focus on destination points as we would like to identify areas that exert a ‘pull’ effect on people, hence resulting in cluster of trajectory destinations. Analyzing the destination of Grab trajectories can provide interesting insights into pull factors within a given area and help identify popular destinations or areas of high mobility demand.
Kernel estimation is implemented in spatstat by the function density.ppp(), a method for the generic command density.
par(mar = c(0,1,1,1))
kde_default_destination <- density(destination_ppp_sg.km)
plot(kde_default_destination,main = "Default Density KDE for Destination Points")
contour(kde_default_destination, add=TRUE)
sigma argument in density() function controls the bandwidth of kernel function. The choice of the bandwidth affects the kernel density estimation strongly. A smaller bandwidth will produce a finer density estimate with all little peaks and valleys. A larger bandwidth will result into a smoother distribution of point densities. Generally speaking, if the bandwidth is too small the estimate is too noisy, while if bandwidth is too high the estimate may miss crucial elements of the point pattern due to over-smoothing (Scott, 2009).

When sigma value is not specified, an isotropic Gaussian kernel will be used, with a default value of sigma calculated by a simple rule of thumb that depends only on the size of the window. Hence, the KDE given by default argument may not be what we aim to achieve. Looking at the KDE plot we have created above, there are signs of oversmoothing where only a single spatial cluster in the CBD area being observable. This can severely limit our analysis as potential small-scale clusters and other interesting details are being masked by the oversmoothing effect.
To overcome this challenge, we can specify smoothing bandwidth through the argument sigma or kernel function through the argument kernel to compute and plot more intuitive and detailed KDE maps.
7.3 Creating KDE Layers with Fixed Bandwidth
7.3.1 Computing Fixed Bandwidths Using Different Bandwidth Selection Methods
density() function of spatstat allows us to compute a kernel density for a given set of point events.
bw.diggle()Cross Validated Bandwidth Selection for Kernel Density: In this method, band-width σ is chosen to minimize the mean-square error criterion defined by Diggle (1985). The mean-square error is a measure of the average of the squares of the errors - that is, the average squared difference between the estimated values and the actual value.bw.CvL()Cronie and van Lieshout’s Criterion for Bandwidth Selection for Kernel Density: The bandwidth σ is chosen to minimize the discrepancy between the area of the observation window and the sum of reciprocal estimated intensity values at the points of the point process. This method aims to choose a bandwidth that best represents the underlying point process, taking into account both the observed points and the area they occupy.bw.scott()Scott’s Rule for Bandwidth Selection for Kernel Density: The bandwidth σ is computed by the rule of thumb of Scott (1992). The bandwidth is proportional to \(n^{-1/(d-4)}\) where n is the number of points and d is the number of spatial dimensions. This rule is very fast to compute. It typically produces a larger bandwidth than Diggle’s method. It is useful for estimating gradual trend.bw.ppl()Likelihood Cross Validation Bandwidth Selection for Kernel Density: This approach, explained by Loader (1999), uses likelihood cross-validation to determine the bandwidth (σ) by maximizing the point process likelihood. This method is beneficial when the aim is to maximize the likelihood of observing the given data.
bw_diggle <- bw.diggle(destination_ppp_sg.km)
bw_diggle sigma
0.008317447
bw_CvL <- bw.CvL(destination_ppp_sg.km)
bw_CvL sigma
3.745658
bw_scott <- bw.scott(destination_ppp_sg.km)
bw_scott sigma.x sigma.y
1.4763217 0.9063352
bw_ppl <- bw.ppl(destination_ppp_sg.km)
bw_ppl sigma
0.1913655
We notice that bw_diggle, bw_CvL and bw_ppl all give a numeric sigma value, whereas bw_scott, by default, provides a separate bandwidth for each coordinate axis. In the code output above, sigma.x = 1.4763217 and sigma.y = 0.9063352 are the estimated bandwidths for the x and y coordinates, respectively. These values represent the amount of smoothing applied in each direction when estimating the kernel density.
We can specify isotropic=TRUE argument when calculating bw_scott() method to produce a single value bandwidth.
bw_scott_single <- bw.scott(destination_ppp_sg.km, isotropic=TRUE)
bw_scott_single sigma
1.156738
The optimized bandwidth values generated from above methods belongs to the special class bw.optim. The plot function can be used to see the objective function for the optimisation that leads to the result.
par(mfrow = c(1,2))
plot(bw_diggle, xlim=c(-0.02,0.05), ylim=c(-60,200))
plot(bw_CvL)
par(mfrow = c(1,2))
plot(bw_scott)
plot(bw_ppl, xlim=c(-1,5), ylim=c(70000,130000))
7.3.2 Plotting Fixed-Bandwidth KDE Layers
In practice, there are no definite method to choose the KDE bandwidth. Many literature has outlined a diverse range of approaches for KDE bandwidth selection. According to Wolff and Asche (2009), the choice of bandwidth in many existing studies is mostly conducted by visually comparing different bandwidth setting.
Hence, we will now create KDE layers based on each bandwidth selection method and visualize them to have a better comparison of how distinct the resulting KDE layers are.
kde_diggle <- density(destination_ppp_sg.km, bw_diggle)
kde_CvL <- density(destination_ppp_sg.km, bw_CvL)
kde_scott <- density(destination_ppp_sg.km, bw_scott)
kde_ppl <- density(destination_ppp_sg.km, bw_ppl)
par(mar = c(1,1,1,1.5),mfrow = c(2,2))
plot(kde_diggle,main = "kde_diggle")
plot(kde_CvL,main = "kde_CvL")
plot(kde_scott,main = "kde_scott")
plot(kde_ppl,main = "kde_ppl")
Next, we will try to plot histograms to compare the distribution of KDE values obtained from density() function using different bandwidth selection methods.
par(mar = c(2,2,2,2),mfrow = c(2,2))
hist(kde_diggle,main = "kde_diggle")
hist(kde_CvL,main = "kde_CvL")
hist(kde_scott,main = "kde_scott")
hist(kde_ppl,main = "kde_ppl")
We can interpret the outputs as below:
kde_diggle: The sharp peak at the beginning indicates that the Diggle method for bandwidth selection has indentified a high concentration of points in the first bin. The rest of the bins has little to no concentration. This may suggest that one specific area in our observation window has observed a relatively high spatial clustering than the rest of the window.
kde_CvL: The more balanced distribution suggests that the CvL method for bandwidth selection is identifying a broader range of spatial point concentration. However, the bin sizes are quite small, which smooths out the overall distribution and masks some of the finer details.
kde_scott: The wider range of values and less sharp peak compared to kde_diggle indicate that the Scott method is capturing a wider range of spatial point concentrations, including both densely concentrated locations and moderately concentrated ones.
kde_ppl: The result is very similar to the Diggle method, the sharp peak at the beginning suggests a high concentration of points in a specific area, suggest that one specific area in our observation window has observed a relatively high spatial clustering than the rest of the area.
Another apporach to compare the KDE layers is to calculate the standard error of each density estimation. $SE is used to extract the standard error of the density estimate from the output of the density() function
dse_diggle <- density(destination_ppp_sg.km, bw_diggle, se=TRUE)$SE
dse_CvL <- density(destination_ppp_sg.km, bw_CvL, se=TRUE)$SE
dse_scott <- density(destination_ppp_sg.km, bw_scott, se=TRUE)$SE
dse_ppl <- density(destination_ppp_sg.km, bw_ppl, se=TRUE)$SEpar(mar = c(1,1,1,1.5),mfrow = c(2,2))
plot(dse_diggle,main = "standard error_diggle")
plot(dse_CvL,main = "standard error_CvL")
plot(dse_scott,main = "standard error_scott")
plot(dse_ppl,main = "standard error_ppl")
The standard error (SE) of the density estimate provides a measure of the uncertainty associated with the density estimate at each point. This can be useful for understanding the variability of density estimates, especially when comparing density estimates obtained using different bandwidths.
However, in many applications of KDE, the focus is often on the shape of the density estimate rather than its absolute value. In such cases, the standard error might not be as relevant. Hence, in this analysis, we will not use standard error as a criteria for choosing the bandwidth.
7.3.3 Choosing Fixed KDE Bandwidth Selection Method
Upon the exploration of various fixed bandwidth selection methods for computing KDE vales, and subsequent plotting of the respective KDE estimates, their distributions and associated standard errors, we will now select the KDE bandwidth to be used in our analysis. As we have seen in Section 7.3.1.2, each KDE bandwidth method has produced a distinct KDE and there is no definite method to choose the KDE bandwidth.
We will proceed to choose bw_scott method for further analysis. This is because:
bw_scottmethod provides a pair of bandwidth values for each coordinate axis. This allows it to capture the different levels of spatial clustering in each direction more accurately.bw_scottmethod capture the balance between bias and variance the best among all methods. If the bandwidth is too small, the estimate may be too skewed (high variance). The distribution histograms of KDE layers usingbw_diggleandbw_ppltend to indicate such nature. On the other hand, if the bandwidth is too large, the estimate may be oversmoothed, missing crucial elements of the point pattern (high bias). This is what we observed in the distribution histogram of KDE layer usingbw_CvL.
Since we have chosen to use bw_scott method, now we will plot the KDE layer using this method for further analysis.
par(mar = c(0,1,1,1))
bw_fixed_scott <- bw.scott(destination_ppp_sg.km)
bw_fixed_scott sigma.x sigma.y
1.4763217 0.9063352
kde_fixed_scott <- density(destination_ppp_sg.km, bw_fixed_scott)
plot(kde_fixed_scott,main = "Fixed-Bandwidth KDE for Grab Destination Points (Using bw_scott)")
contour(kde_fixed_scott, add=TRUE)
However, upon visual inspection, there are signs of a certian degree of over-smoothing when we directly use the bandwidth provide by bw_scott method. Automatic bandwidth selection methods provides a starting point for bandwidth selection, and further fine-tuning might be necessary based on the results of the plot we have created above.
To do so, we will use rule of thumb adjustment by diving the bandwidth value by 2, to reduce the bandwidth size, and hence possible over-smoothing effect.
par(mar = c(0,1,1,1))
kde_fixed_scott <- density(destination_ppp_sg.km, sigma=bw_fixed_scott/2)
plot(kde_fixed_scott,main = "Fixed-Bandwidth KDE for Grab Destination Points (Using bw_scott)")
contour(kde_fixed_scott, add=TRUE)
Looking at the plot created, it appears that by reducing the bandwidth (thus making the point cluster buffers smaller), the over-smoothing effects have been minimized. However, we still can observe the Grab destination hotspot areas, which we can investigate further in subsequent sections.
7.3.4 Kernel Function Selection for Fixed Bandwidth

Kernel functions are another consideration in KDE computation because they control how we weight points within the bandwidth radius. The default kernel in density.ppp() is the gaussian. But there are other options such as epanechnikov, quartic or disc.
gaussian
epanechnikov
quartic
disc
kde_fixed_scott.gaussian <- density(destination_ppp_sg.km,
sigma=bw_fixed_scott,
edge=TRUE,
kernel="gaussian")
kde_fixed_scott.epanechnikov <- density(destination_ppp_sg.km,
sigma=bw_fixed_scott,
edge=TRUE,
kernel="epanechnikov")
kde_fixed_scott.quartic <- density(destination_ppp_sg.km,
sigma=bw_fixed_scott,
edge=TRUE,
kernel="quartic")
kde_fixed_scott.disc <- density(destination_ppp_sg.km,
sigma=bw_fixed_scott,
edge=TRUE,
kernel="disc")
par(mar = c(1,1,1,1.5),mfrow = c(2,2))
plot(kde_fixed_scott.gaussian, main="Gaussian")
plot(kde_fixed_scott.epanechnikov, main="Epanechnikov")
plot(kde_fixed_scott.quartic, main="Quartic")
plot(kde_fixed_scott.disc, main="Disc")
Xie & Yan (2008) suggested that the choice of kernel function has minimal impact on density estimation outcomes. Shen et al. (2020) further identified the search bandwidth as a more influential factor in kernel estimation than the selection of different kernel functions. Empirically looking at the KDE maps we have created above using different kernel functions, similar conclusion can be made. Despite the slight variations in smoothness and spread, all four plots show similar patterns of density estimation. This underscores the argument that the choice of kernel function does not significantly impact KDE results. Consequently, we will not focus this aspect in our analysis.
7.4 Creating KDE Layers with Spatially Adaptive Bandwidth
Fixed bandwidth kernels are commonly employed in statistical literature due to their ease of implementation. However, their application in spatial datasets often yields suboptimal estimations due to a lack of spatial and temporal adaptability (González & Moraga, 2022). Consequently, the more intuitive ‘adaptive smoothing’ approach has emerged. In this technique, the amount of smoothing is inversely related to the density of the points.
In spatstat packages, there are three main approaches (Pebesma & Bivand, 2023) in creating KDE layers with spatially adapative bandwidth.
Voronoi-Dirichlet Adaptive Density Estimate: It computes the intensity function estimate of a point pattern dataset by creating tessellations. On default, the input point pattern data is used to construct a Voronoi/Dirichlet tessellation (Barr and Schoenberg, 2010). The intensity estimate at a given location equals the reciprocal of the size of the Voronoi/Dirichlet cell containing that location.
Adaptive Kernel Density Estimate: It computes an estimate of the intensity function of a point pattern dataset using the partitioning technique of Davies and Baddeley (2018). It dynamically specifies the smoothing bandwidth to be applied to each of the points. The partitioning method of Davies and Baddeley (2018) accelerates this computation by partitioning the range of bandwidths into n-groups intervals, correspondingly subdividing the point patterns into n-groups, sub-patterns according to bandwidth, and applying fixed-bandwidth smoothing to each sub-pattern.
Nearest-Neighbour Adaptive Density Estimate: It computes an estimate of the intensity function of a point pattern dataset using the distance from each spatial location to the kth nearest points (Cressie, 1991: Silverman, 1986;Burman & Nolan, 1989). The default value of k is the square root of the number of points in the dataset. This estimator of intensity is relatively fast to compute and is spatially adaptive. Some studies suggest the use of the nearest neighbor distance as a suitable parameter for determining the bandwidth (Krisp et al., 2009).
7.4.1 Voronoi-Dirichlet Adaptive Density Estimate
The Dirichlet-Voronoï estimator is computed in spatstat by the function adaptive.density() with argument method="voronoi".
kde_destination_dirichlet_adaptive <- adaptive.density(destination_ppp_sg.km, f=1, method = "voronoi")par(mar = c(0,1,1,1))
plot(kde_destination_dirichlet_adaptive,main = "Voronoi-Dirichlet Adaptive Density Estimate")
7.4.2 Adaptive Kernel Density Estimate
The Adaptive Kernel estimator is computed in spatstat by the function adaptive.density() with argument method="kernel".
kde_destination_kernel_adaptive <- adaptive.density(destination_ppp_sg.km, method = "kernel")par(mar = c(0,1,1,1))
plot(kde_destination_kernel_adaptive,main = "Adaptive Kernel Density Estimate")
7.4.3 Nearest-Neighbour Density Estimate
The Nearest-Neighbour estimator is computed in spatstat by the function nndensity().
kde_adaptive_nn <- nndensity(destination_ppp_sg.km, k=10)par(mar = c(0,1,1,1))
plot(kde_adaptive_nn,main = "Nearest-Neighbour Adaptive Density Estimate")
7.4.3 Choosing Adaptive KDE Method
Similar to what we did for fixed bandwidth, we can try to plot histograms to compare the distribution of KDE values obtained from density() function using different adaptive bandwidth selection methods.
par(mar = c(2,2,2,2),mfrow = c(2,2))
hist(kde_destination_dirichlet_adaptive,main = "Voronoi-Dirichlet Adaptive")
hist(kde_destination_kernel_adaptive,main = "Adaptive Kernel")
hist(kde_adaptive_nn,main = "Nearest-Neighbour Adaptive")
From the outputs, it seems that there is no significance difference between the distribution of KDE values obtained across different methods. All three methods identified a high concentration of points in a specific area. Hence, we will choose to go with Adapative Kernel method because it provides the most
7.5 Plotting Interactive KDE Maps
raster_kde_fixed_scott <- raster(kde_fixed_scott)
raster_kde_adaptive_nn <- raster(kde_adaptive_nn)
raster_kde_adaptive_kernel <- raster(kde_destination_kernel_adaptive)projection(raster_kde_fixed_scott) <- CRS("+init=EPSG:3414 +units=km")
projection(raster_kde_adaptive_nn) <- CRS("+init=EPSG:3414 +units=km")
projection(raster_kde_adaptive_kernel) <- CRS("+init=EPSG:3414 +units=km")tmap_mode('view')
kde_fixed_scott <- tm_basemap(server = "OpenStreetMap.HOT") +
tm_basemap(server = "Esri.WorldImagery") +
tm_shape(raster_kde_fixed_scott) +
tm_raster("layer",
n = 10,
title = "KDE_Fixed_scott",
alpha = 0.6,
palette = c("#fafac3","#fd953b","#f02a75","#b62385","#021c9e")) +
tm_shape(mpsz_sf)+
tm_polygons(alpha=0.1,id="PLN_AREA_N")+
tmap_options(check.and.fix = TRUE)
tmap_mode('view')
kde_adaptive_nn <- tm_basemap(server = "OpenStreetMap.HOT") +
tm_basemap(server = "Esri.WorldImagery") +
tm_shape(raster_kde_adaptive_nn) +
tm_raster("layer",
n = 7,
title = "KDE_Adaptive_nn",
style = "pretty",
alpha = 0.6,
palette = c("#fafac3","#fd953b","#f02a75","#b62385","#021c9e")) +
tm_shape(mpsz_sf)+
tm_polygons(alpha=0.1,id="PLN_AREA_N")+
tmap_options(check.and.fix = TRUE)
tmap_mode('view')
kde_adaptive_kernel <- tm_basemap(server = "OpenStreetMap.HOT") +
tm_basemap(server = "Esri.WorldImagery") +
tm_shape(raster_kde_adaptive_kernel) +
tm_raster("layer",
n = 7,
title = "KDE_Adaptive_Kernel",
style = "pretty",
alpha = 0.6,
palette = c("#fafac3","#fd953b","#f02a75","#b62385","#021c9e")) +
tm_shape(mpsz_sf)+
tm_polygons(alpha=0.1,id="PLN_AREA_N")+
tmap_options(check.and.fix = TRUE)
tmap_arrange(kde_fixed_scott, kde_adaptive_nn, kde_adaptive_kernel ,ncol=1,nrow=3,sync = TRUE)7.6 Analysis of Singapore-Level Kernel Density Estimation Maps
FromThe highest cluster of Grab taxi drop-off points are seen near Changi Airport, where the concentrations reach up to 450.
The central and southern parts of Singapore, particularly the Central Business District (CBD) and Marina Bay areas, also exhibit substantial concentrations of Grab taxi drop-off points, peaking at 350. These areas have a high demand for taxi services, possibly due to a high concentration of businesses, tourist attractions.
Interestingly enough, we found a few residential subzones outside of the CBD region, which is comparatively higher density values compared to the rest of the island. However, because of the smoothing effect, identifying these hotspots using a fixed-bandwidth KDE map is difficult. Hence, we proceeded to perform a cross inspection with adaptive KDE maps for better precision in hotspots identification.
Through adaptive nearest neighbour KDE map, we identified distinct hotspots in Jurong West (7000-8000), Woodlands (6000-7000), Tampines (4000-5000), and Toa Payoh (4000-5000). The Adaptive Kernel KDE map complements this finding by accentuating two additional hotspots, Jurong East (500-1000) and Punggol (500-1000).

It is intriguing to see that the six residential subzones we have identified have higher concentrations of drop-off points, despite being predominantly classified as residential areas , according to the Master Plan 2019 (MP19) by Urban Redevelopment Authority Singapore. It will be interesting to discern underlying spatial process and pull factors that lead to this spatial cluster. Therefore, we will attempt to create comparable KDE maps at the planning subzone level.
7.7 Planning Area-Level Kernel Density Estimation
In this section, we will create planning-area level KDE maps for six planning areas we have identified. In order to create such maps, we will carry out additional data wrangling as required.
Firstly, we will filter out different planning areas as separate sf objects from mpsz_sf.
wl = mpsz_sf %>% filter(PLN_AREA_N == "WOODLANDS")
je = mpsz_sf %>% filter(PLN_AREA_N == "JURONG EAST")
jw = mpsz_sf %>% filter(PLN_AREA_N == "JURONG WEST")
tn = mpsz_sf %>% filter(PLN_AREA_N == "TAMPINES")
tp = mpsz_sf %>% filter(PLN_AREA_N == "TOA PAYOH")
pg = mpsz_sf %>% filter(PLN_AREA_N == "PUNGGOL")
par(mar = c(1,1,1,0),mfrow=c(2,3))
plot(st_geometry(wl), main = "Woodlands")
plot(st_geometry(je), main = "Jurong East")
plot(st_geometry(jw), main = "Jurong West")
plot(st_geometry(tn), main = "Tampines")
plot(st_geometry(tp), main = "Toa Payoh")
plot(st_geometry(pg), main = "Punggol")
Next, we will create owin objects to represent the observation windows for respective planning area. Once owin objects are created, we will also filter Grab taxi drop-off points in each observation window from the original destination_ppp_sg ppp object.
wl_owin = as.owin(wl)
je_owin = as.owin(je)
jw_owin = as.owin(jw)
tn_owin = as.owin(tn)
tp_owin = as.owin(tp)
pg_owin = as.owin(pg)
destination_wl_ppp = destination_ppp_sg[wl_owin]
destination_je_ppp = destination_ppp_sg[je_owin]
destination_jw_ppp = destination_ppp_sg[jw_owin]
destination_tn_ppp = destination_ppp_sg[tn_owin]
destination_tp_ppp = destination_ppp_sg[tp_owin]
destination_pg_ppp = destination_ppp_sg[pg_owin]Now that we have prepared both owin and ppp objects for each planning area, we are ready to plot KDE maps. Similar to what we have done in previous section, we will try both fixed-bandwitdh and adaptive bandwidth KDE maps.
7.7.1 Planning Area-Level Fixed-Bandwidth KDE Maps
wl_kde_scott <- density(destination_wl_ppp, sigma=bw.scott, main="Woodlands")
je_kde_scott <- density(destination_je_ppp, sigma=bw.scott, main="Jurong East")
jw_kde_scott <- density(destination_jw_ppp, sigma=bw.scott, main="Jurong West")
tn_kde_scott <- density(destination_tn_ppp, sigma=bw.scott, main="Tampines")
tp_kde_scott <- density(destination_tp_ppp, sigma=bw.scott, main="Toa Payoh")
pg_kde_scott <- density(destination_pg_ppp, sigma=bw.scott, main="Punggol")
par(mar = c(1,1,1,1.5),mfrow = c(3,2))
plot(wl_kde_scott,main = "Fixed KDE Woodlands")
contour(wl_kde_scott, add=TRUE)
plot(je_kde_scott,main = "Fixed KDE Jurong East")
contour(je_kde_scott, add=TRUE)
plot(jw_kde_scott,main = "Fixed KDE Jurong West")
contour(jw_kde_scott, add=TRUE)
plot(tn_kde_scott,main = "Fixed KDE Tampines")
contour(tn_kde_scott, add=TRUE)
plot(tp_kde_scott,main = "Fixed KDE Toa Payoh")
contour(tp_kde_scott, add=TRUE)
plot(pg_kde_scott,main = "Fixed KDE Punggol")
contour(pg_kde_scott, add=TRUE)
7.7.2 Planning Area-Level Adaptive-Bandwidth KDE Maps
wl_kde_adaptive_kernel <- adaptive.density(destination_wl_ppp, method = "kernel")
je_kde_adaptive_kernel <- adaptive.density(destination_je_ppp, method = "kernel")
jw_kde_adaptive_kernel <- adaptive.density(destination_jw_ppp, method = "kernel")
tn_kde_adaptive_kernel <- adaptive.density(destination_tn_ppp, method = "kernel")
tp_kde_adaptive_kernel <- adaptive.density(destination_tp_ppp, method = "kernel")
pg_kde_adaptive_kernel <- adaptive.density(destination_pg_ppp, method = "kernel")
par(mar = c(1,1,1,1.5),mfrow = c(3,2))
plot(je_kde_adaptive_kernel,main = "Adaptive KDE Woodlands")
contour(je_kde_adaptive_kernel, add=TRUE)
plot(je_kde_adaptive_kernel,main = "Adaptive KDE Jurong East")
contour(je_kde_adaptive_kernel, add=TRUE)
plot(jw_kde_adaptive_kernel,main = "Adaptive KDE Jurong West")
contour(jw_kde_adaptive_kernel, add=TRUE)
plot(tn_kde_adaptive_kernel,main = "Adaptive KDE Tampines")
contour(tn_kde_adaptive_kernel, add=TRUE)
plot(tp_kde_adaptive_kernel,main = "Adaptive KDE Toa Payoh")
contour(tp_kde_adaptive_kernel, add=TRUE)
plot(pg_kde_adaptive_kernel,main = "Adaptive KDE Punggol")
contour(pg_kde_adaptive_kernel, add=TRUE)
7.8 Analysis of Planning Area-Level Kernel Density Estimation Maps
Observations from the planning area-level KDE maps reveal that while KDE maps excel in visualizing and analyzing spatial data, their application to micro-level analysis, like planning subzones, has limitations. Firstly, there are issues with over-smoothing (in fixed KDE maps) and undersmoothing (in adaptative KDE maps) which deters us from discerning meaningful insights. Furthermore, KDE values, generated on grid-pixels using Euclidean distance, result in grid blocks that limit our ability to discern fine details and variations within each subzone.
For example, let’s look at the picture below, which is a snapshot from KDE adaptive nearest-neighbor map. A hotspot is identified and represented by a concentrated red zone near Braddell Road in Toa Payoh. However, just by looking at this map, it's really hard to understand more about this hotspot. We know there is a cluster of drop-off points, but why are they there? What are pull factors attracting people to this area? It's almost impossible to answer these questions just by looking at this KDE map.

This limitation underscores the need for more advanced visualization techniques or analytical methods that can provide alternative approaches to understanding spatial point clusters. With this limitation in mind, we will apply a new analytical method called network constrained kernel density estimation (NKDE) to offer more detailed insights into the spatial distribution of Grab taxi drop-off points through the integration of road networks.
8.0 Network Constrained Kernel Density Estimation (NKDE)
In the real world, point events are often not randomly distributed. Instead, their distribution is constrained by networks. When carrying out spatial point pattern analysis, traditional Kernel Density Estimation (KDE) assumes an infinite, homogeneous, two-dimensional space, an approximation that is not accurate for network-based study areas. In such networks, movement is constrained by multiple one-dimensional lines (Gelb, 2021). Network Constrained Kernel Density Estimation (NKDE), therefore, is a widely used approach to identify the hotspots and evaluate origin-destination points along with the road network (Shen et al. 2020).
This approach estimates the intensity of the spatial process solely on the network. The network edges are divided into lixels (one-dimensional pixels), and the centers of the lixels serve as the locations for intensity estimation. Distances between events and sampling points are calculated as the shortest path distances on the network, instead of Euclidean distances. This adjustment slightly modifies the intensity function from the classical KDE function. The adapted formula makes the interpretation straightforward: it “estimates the density over a linear unit” rather than an area unit.
In this section, we will follow-up on the six planning areas we identified in Section 7.6 and attempt to create NKDE maps using spNetwork package.
8.1 Extracting Road Networks for Focus Planning Areas
Before we carry out the anlysis, we will extract relevant datasets required for calculating NKDE values in each planning area:
road network
destination points
wl_network = st_intersection(sg_driving_sf,st_union(wl))
je_network = st_intersection(sg_driving_sf,st_union(je))
jw_network = st_intersection(sg_driving_sf,st_union(jw))
tn_network = st_intersection(sg_driving_sf,st_union(tn))
tp_network = st_intersection(sg_driving_sf,st_union(tp))
pg_network = st_intersection(sg_driving_sf,st_union(pg))wl_destination = st_intersection(destination_sf,st_union(wl))
je_destination = st_intersection(destination_sf,st_union(je))
jw_destination = st_intersection(destination_sf,st_union(jw))
tn_destination = st_intersection(destination_sf,st_union(tn))
tp_destination = st_intersection(destination_sf,st_union(tp))
pg_destination = st_intersection(destination_sf,st_union(pg))8.3 Data Preparation
For illustrative purpose, I’ll use “Punggol” as a example to demonstrate step-by-step data preparation process. Upon completion of this demonstration, maps for the remaining planning areas will be created.
The spNetwork package contains nkde function that is specifically designed to implement Network Constrained Kernel Density Estimation (NetKDE).
The three main inputs are:
lixels, a “SpatialLinesDataFrame”, representing the lines of the network.
events, a “SpatialPointsDataframe”, representing the realizations of the spatial process.
samples, a “SpatialPointsDataframe” providing the locations where the density must be estimated.
8.3.1 Preparing the lixels objects
Before computing NetKDE, the SpatialLines object need to be cut into lixels with a specified minimal distance. This task can be performed by using with lixelize_lines() of spNetwork.
pg_lixels <- lixelize_lines(pg_network,
700,
mindist = 350)In this code snippet, the length of a lixel is set to 700m, and the minimum length of a lixel, denoted as mindist, is set to 350m. This implies that during segmentation, if the length of the final lixel is shorter than the minimum distance, then it is added to the previous lixel. The segments that are already shorter than the minimum distance are not modified.
8.3.2 Generating line centre points
Next, lines_center() of spNetwork will be used to generate a SpatialPointsDataFrame (i.e. samples) with line centre points. The points are located at center of the line based on the length of the line.
pg_samples <- lines_center(pg_lixels)8.4 Performing NKDE
8.4.1 Computing Network Constrained Kernel Density Estimation
The spNetwork package offers three methods for calculating NKDE values simple, discontinuous and continuous.
Simple NKDE: Simple NKDE method was proposed by Xie and Yan (2008), extending the planar KDE to a network-based model. However, there are issues with this method due to statistical incorrectness. Specifically, at network intersections, the event’s mass is multiplied in each direction, leading to overestimation and inflated density estimates, which can result in misleading interpretations (Gelb, 2021). To address these issues, Okabe et al. (2009) proposed two unbiased estimators:Discontinuous NKDEandContinuous NKDE.Discontinuous NKDE: Discontinuous NKDE aims to resolve density inflation by dividing the mass at intersections according to the number of directions minus one, resulting in an unbiased estimator (Gelb, 2021). However, the discontinuous nature of this method can be counter-intuitive in practical applications.Continuous NKDE: Continuous NKDE proposes to address the limitations of both simple and discontinuous NKDE. It adjusts the values of the NKDE at intersections and applies a backward correction to force the density values to be continuous (Gelb, 2021).
In this analysis, we will compute NKDE values using all three methods and compare the results.
pg_nkde_simple <- nkde(pg_network,
events = pg_destination,
w = rep(1,nrow(pg_destination)),
samples = pg_samples,
kernel_name = "quartic",
bw = 200,
div= "bw",
method = "simple",
digits = 1,
tol = 1,
grid_shape = c(1,1),
max_depth = 8,
agg = 5,
sparse = TRUE,
verbose = FALSE)pg_nkde_discontinuous <- nkde(pg_network,
events = pg_destination,
w = rep(1,nrow(pg_destination)),
samples = pg_samples,
kernel_name = "quartic",
bw = 200,
div= "bw",
method = "discontinuous",
digits = 1,
tol = 1,
grid_shape = c(1,1),
max_depth = 8,
agg = 5,
sparse = TRUE,
verbose = FALSE)pg_nkde_continuous <- nkde(pg_network,
events = pg_destination,
w = rep(1,nrow(pg_destination)),
samples = pg_samples,
kernel_name = "quartic",
bw = 200,
div= "bw",
method = "continuous",
digits = 1,
tol = 1,
grid_shape = c(1,1),
max_depth = 8,
agg = 5,
sparse = TRUE,
verbose = FALSE)bw argument refers to the bandwidth used for calculating KDE values. Xie and Yan (2008) suggested that narrow bandwidths (between 20 and 250 m) are more appropriate for identifying local effects at smaller scales. Hence, we use 200 here.
agg argument allows the events to be aggregated, and their weights to be added within a threshold distance. Aggregating events can simplify networks and limit the number of iterations when calculating the NKDE, hence effectively reducing time complexity of computation.
8.4.1 Visualising NKDE Maps Using Different Methods
Before we can visualise the NetKDE values, we will insert the computed density values into samples and lixels objects. To enhance the readability of the results, you will first multiply the obtained densities by the total number of drop-off points. This adjustment ensures that the spatial integral equals the number of events. Subsequently, you will multiply this value by 1000 to derive the estimated number of drop-off points per kilometer. This process will provide a more intuitive understanding of the density distribution along the network.
pg_samples$nkde_simple <- pg_nkde_simple*nrow(pg_destination)*1000
pg_lixels$nkde_simple <- pg_nkde_simple*nrow(pg_destination)*1000
pg_samples$nkde_discontinuous <- pg_nkde_discontinuous*nrow(pg_destination)*1000
pg_lixels$nkde_discontinuous <- pg_nkde_discontinuous*nrow(pg_destination)*1000
pg_samples$nkde_continuous <- pg_nkde_continuous*nrow(pg_destination)*1000
pg_lixels$nkde_continuous <- pg_nkde_continuous*nrow(pg_destination)*1000Reading layer `singapore_pois' from data source
`/Users/khantminnaing/IS415-GAA/data/geospatial' using driver `ESRI Shapefile'
Simple feature collection with 10922 features and 4 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 5062.069 ymin: 15886.63 xmax: 52129.87 ymax: 50177.75
Projected CRS: SVY21 / Singapore TM
Now we can plot NKDE maps using tmap.
tmap_mode('view')
pg_nkde_simple_map <- tm_basemap(server = "Esri.WorldTopoMap") +
tm_basemap(server = "Esri.WorldGrayCanvas")+
tm_basemap(server = "OpenStreetMap") +
tm_shape(pg_lixels)+
tm_lines(col="nkde_simple", lwd = 2, palette =c("#fafac3","#fd953b","#f02a75","#b62385","#021c9e"), id="nkde_continuous")+
tm_shape(pg_destination)+
tm_dots(size=0.01)
pg_nkde_discontinuous_map <- tm_basemap(server = "Esri.WorldTopoMap") +
tm_basemap(server = "Esri.WorldGrayCanvas")+
tm_basemap(server = "OpenStreetMap") +
tm_shape(pg_lixels)+
tm_lines(col="nkde_discontinuous", lwd = 2, palette =c("#fafac3","#fd953b","#f02a75","#b62385","#021c9e"), id="nkde_continuous")+
tm_shape(pg_destination)+
tm_dots(size=0.01)
pg_nkde_continuous_map <- tm_basemap(server = "Esri.WorldTopoMap") +
tm_basemap(server = "Esri.WorldGrayCanvas")+
tm_basemap(server = "OpenStreetMap") +
tm_shape(pg_lixels)+
tm_lines(col="nkde_continuous", lwd = 2, palette =c("#fafac3","#fd953b","#f02a75","#b62385","#021c9e"), id="nkde_continuous")+
tm_shape(pg_destination)+
tm_dots(size=0.01)
tmap_arrange(pg_nkde_simple_map, pg_nkde_discontinuous_map, pg_nkde_continuous_map ,ncol=3,nrow=1,sync = TRUE)Comparing Simple, Discontinuous and Continuous NKDE Maps

Upon close examination to the intersection as shown above, we can clearly see the density inflation caused by simple NKDE method, as opposed to discontinuous and continuous methods. A comparison between the Discontinuous and Continuous NKDE maps reveals the disjointed nature of lixels resulting from the Discontinuous method, as indicated by the varying color bands representing each lixel segment. Out of all three maps, continuous NKDE map offers the most intuitive representation.
8.5 Plotting NKDE for Different Planning Areas
In this section, we will create interactive maps of different planning areas that we have previousely identified as potential hotspots. These maps will incorporate several key elements: network kernel density estimation values (depicted through lixels), Grab drop-off points, and points of interest (POIs). The purpose of including these elements is to identify the pull factors within these areas. Understanding these factors can help in planning and decision-making processes, such as where to locate new facilities or how to improve transportation routes.